PREFACE



This technical note presents a number of techniques for promoting 
vectorization in FORTRAN programs to be run on the Cray Research 
CRAY-1 Computer System. Because the CRAY-1 FORTRAN Compiler (CFT) is 
continually being refined, this note will be updated periodically. 
With each update, I hope to increase the number of examples and expand 
it in a few other ways to improve its usefulness. 

I welcome any material that might be included in future editions, such
as more examples and coding techniques. I especially solicit help
with Appendix B, where many errors of both omission and commission
undoubtedly lie. In particular, the various table positions that are
blank indicate that I don't know the proper entry. Special thanks are 
due to the several people who sent me suggestions and examples. In 
particular, a great deal of help for this revision was provided by 
Dick Hendrickson. 

LCH 
SECTION 1
INTRODUCTION 



This note describes techniques for helping to vectorize codes written
for the CRAY-1 Computer and the CRAY-1 FORTRAN Compiler (CFT).
Since CFT is continually being refined, some of these
techniques will become unnecessary. The note is primarily intended to aid
programmers vectorizing existing codes but should also aid programmers
who are writing new codes to generate vectorizable loops.

Before going further, a caveat is in order. This note presents
techniques for enhancing the vectorizability of codes only; it
addresses few other methods of increasing program speed. Algorithm
selection is generally far more important than coding techniques. For
example, the FFT and various good sorting algorithms are approximately
N/logN times as fast as the typical simplistic algorithms that perform
the same tasks (for N input data), whereas vectorization usually
increases speed by a factor of 3 to 6. Thus, for typical dataset
sizes, these best algorithms, or other nearly optimal ones, are orders
of magnitude faster than poor algorithms. No fancy coding techniques
can overcome the use of ill-chosen algorithms in such cases. Indeed,
a good algorithm poorly coded is usually preferable to a poor one 
optimally coded. 
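
As a rough illustration of the N/logN figure quoted above, the following sketch computes the ratio for a given N. The function name ALGRAT is hypothetical, and the logarithm is assumed to be base 2, which the text does not specify. It uses ! comments so that a present-day Fortran compiler accepts it in either source form.

```fortran
!     Sketch only: ALGRAT is a hypothetical name, and base 2 is an
!     assumption. Returns N / log2(N), the approximate advantage of
!     an N*log(N) method over an N*N method.
      REAL FUNCTION ALGRAT(N)
      INTEGER N
      ALGRAT = REAL(N) / (ALOG(REAL(N)) / ALOG(2.0))
      RETURN
      END
```

For N = 1024 the ratio is about 100, well beyond the factor of 3 to 6 that vectorization usually provides.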

Also, this note does not address to any great extent good programming
practices, which for CFT include (1) using few loops with long code
blocks in preference to many short code loops; (2) judicious use of
typing of variables; (3) long loops inside short loops rather than
vice versa; and (4) if you are trying to get the last little bit from
a vectorized loop, inserting extra parentheses starting at the end of
an expression so that operations occur in an order that increases
chaining. The techniques that are described are presented in six
groups comprising the remaining sections of this note.

Sections 2 and 3 present the central issues that must be resolved
before any useful work can be done, namely (1) finding the
time-consuming portions of the program and (2) circumventing overly modular
or structured programming techniques. These tasks are so fundamental
that compiler improvements are unlikely to be of much aid in the
foreseeable future without programmer help in these areas.

Section 4 discusses recursion, feedback, or vector dependency and
introduces a simple but very useful and powerful technique: using
directives which allow the programmer to indicate to the compiler that
an individual block of code is logically vectorizable. Frequently,
although the programmer knows from the physics of a situation that the
code is vectorizable, coding in a form that allows the compiler to see
this may not be convenient. Directives provide a means of pointing
out vectorizable code to the compiler when this happens.
Section 5 shows one way to partially vectorize codes with irregular
addressing, another anathema of vectorization, but one that the
compiler will work around before too long.

Section 6 is a potpourri of "tricks" to improve vectorization that do
not readily fit into any of the earlier discussions.

The final section describes removing IF statements, a syntactic
construct that the compiler will soon handle, at least in some cases.

Thus, Sections 5 and 7 should be of less interest to programmers who
are not in a big hurry to get the highest speed from their codes.

As a general note, CFT vectorizes innermost DO-loops only; it does not
vectorize IF loops. Table 1 lists typical syntactic elements that may
inhibit vectorization. Except for I/O statements, which often can be
moved outside of loops after the debug phase, I discuss the more
difficult of these constructs and how the programmer can remove them
so that CFT will vectorize loops. Although not vectorized in the
usual sense, unformatted I/O statements which involve arrays are
processed with vector techniques.

CFT 1.06, the July 1979 release of CFT, vectorizes loops with
constructs in the easy group (table 1) and allows scalar temporaries
and user-provided functions (if written in CAL) from the second group.
However, it inhibits vectorization for a loop containing other
constructs in the second group or constructs in the third or fourth
group. The third group includes constructs that are theoretically
vectorizable but present challenges to the compiler writers. The items
in the fourth group present a theoretical impossibility, so that the
only real hope for vectorizing loops containing them in the near future
is to break the loop into several loops with the "impossible" construct
in a separate (scalar) loop. If loops can be recast so that the inner DO
loops include only items in the first categories, the chance of
vectorization is enhanced and such loops that do not vectorize with
CFT 1.06 are more likely to vectorize in the near future.
Table 1. FORTRAN inner DO-loop constructs

Difficulty        Syntactic constructs

Easy              - Long or complicated loops
                  - Non-unit incrementing of subscripts
                  - Expressions in subscript
                  - Intrinsic function references

Straightforward   - Scalar temporary variables
                  - Function calls to programmer-supplied functions
                  - Inner products
                  - Logical IF statements
                  - Transfer out of a loop (search loop)
                  - Reduction operations

Difficult         - Linear recursion
                  - IF statements
                  - Some I/O
                  - Complicated subscript expressions

'Impossible'      - Nonlinear indexing
                  - Complicated branching within a loop
                  - Ambiguous subscripting
                  - Transfers into a loop
                  - Subroutine calls
                  - Nonlinear recursion
                  - Some I/O
SECTION 2
FINDING THE CENTRAL PORTION OF THE PROGRAM

Before spending time vectorizing parts of the program that do not
significantly affect the run time, analyze the program to
determine where it spends its time. This information may be readily
available if the code is simple or if there is someone available who 
is familiar with it. Suppose, however, that this is the first time 
you've been faced with this task and that you have never seen the 
program before. 



A typical, well behaved or "nice" program has a structure similar
to that illustrated below. With any luck, you will be able to find a
similar pattern in your program and will be able to concentrate on the
inner points of the program where your efforts will significantly
impact the program's run time.

        INITIALIZATION

     +-- BOUNDARY POINTS
     |   INNER POINTS
     +-- DONE? AND I/O

        NEXT CASE?

If you question the worth of this, look at a typical program and you
are likely to see many simple vector loops that have been there all
along. The trouble is, they initialize the grid and are not used for
any of the computations! Since a problem with a grid of 100 points on
a side has about 400 boundary points and about 10,000 interior points,
working on the interior points and ignoring the boundary points, let
alone the initialization, is clearly worthwhile. Even entirely
removing the code for the boundary points leaves more than 96% of the
points in the grid.

If you cannot discern the general structure, a worthwhile procedure is 
to use the flow analysis option in CFT to get a complete list of the 
subroutine calling tree and the time spent in each of the called 
routines. These figures tell you which routines consume the largest
fractions of the time and thus which routines are worth looking
at. Refer to CFT Reference Manual, section 5, for a description of
the flowtrace option.
The flowtrace option adds a substantial overhead to every subroutine
call and its output is lost if a job fails or executes a CALL EXIT.* 
Thus, if you have a program with many small subroutines, it is 
worthwhile to flowtrace a small case, at least for starters. You 
might also put a test such as the following in your code to stop the 
job after a reasonable length of time: 

      IF (SECOND() .GT. 50) STOP**

To use the flowtrace option (CFT Manual section 5.4.5), put ON=F on
the CFT statement.

At the end of the run, you will get a table listing the time, percent
of total time, number of times entered, and average time for each
routine that is called, as well as what routines called it and what
routines it called. Only calls to FORTRAN programs that are compiled
by the CFT,ON=F... statement are monitored; $FTLIB, $SCILIB, $SYSLIB,
and CAL routines are not monitored, nor are the FORTRAN routines
compiled separately without flowtrace enabled. Because of the great
difference in execution speed of vectorized code compared to
non-vectorized code, use of flowtrace is recommended even if you are
familiar with a program. The timing analysis of flowtrace is
frequently surprising.



*EXIT might be the name of one of your routines. Thus, the system 
cannot automatically assume that EXIT terminates a program. If you 
use EXIT to terminate your program, you can still use flowtrace by 
inserting the following subprogram in your deck. 



      SUBROUTINE EXIT
      STOP 'EXIT'
      END



**This is one of many examples of non-standard FORTRAN employed in
this note. CFT accepts all FORTRAN shown in the examples here. 
SECTION 3
GETTING AROUND OVERLY MODULAR OR 
STRUCTURED PROGRAMS 

Assume your code looks like this:

      DO 31 I = 2,99
      DO 31 J = 2,99
      CENTPT = DATA(I,J)
      PTLEFT = DATA(I-1,J)
      PTRGHT = DATA(I+1,J)
      TEMP = TEMPURTR(I,J)
      TEMPRT = TEMPURTR(I+1,J)
      TEMPLFT = TEMPURTR(I-1,J)
      CALL INTGRTE(CENTPT,PTLEFT,PTRGHT,TEMPLFT,TEMPRT)
      CALL EQNOST(CENTPT,TEMP)
      DATA(I,J) = CENTPT
      TEMPURTR(I,J) = TEMP
   31 CONTINUE

Your first impulse probably is to put this away until there are global
FORTRAN compilers that vectorize messes like this. However, this
represents a very common situation and is not nearly as hopeless as it
first appears. It is hopeless for CFT as written, however; thus, it is
your chore to put the DO loops inside the subroutines or, conversely,
the subroutines inside the DO loops. Putting DO loops inside subroutines
or the converse operation is probably the most complex part of
vectorizing codes and is the part that is most likely to be beneficial
in the long run. The other techniques discussed in this note are more
likely to be handled by the compiler or by a vectorizer someday.

Putting DO loops inside subroutines probably entails subscripting the 
variable names being passed to the subroutines and passing the entire 
arrays at once. Putting the subroutine in the loop means expanding 
the subroutine code in line in the loop. 

In the above loops, this can be done easily. Perhaps in your problem 
the surface is not flat but is a sphere and so the right and left 
points wrap around at the ends causing nonlinear indexing. Then, you 
will have to try to separate the "bad points" and perhaps use some of 
the techniques suggested in later sections. 
To illustrate putting DO loops into subroutines and vice versa,
suppose the subroutines are as follows:

      SUBROUTINE INTGRTE(C,PL,PR,TL,TR)
      COMMON DELTAX,DELTAT,GAMMAI,V,R
      C = C + DELTAT*0.5*(PL+PR)*DELTAX/(TR-TL)
      RETURN
      END

      SUBROUTINE EQNOST(P,T)
      COMMON DELTAX,DELTAT,GAMMAI,V,R
      T = (P/(V*R))**GAMMAI
      RETURN
      END

Then the two rewrites of the loop look like this: 

Case 1. Putting loops inside subroutines.

The entire DO 31 loop pair is replaced by:

      CALL INTGRTV(DATA,TEMPURTR)
      CALL EQNOSTV(DATA,TEMPURTR)

where INTGRTV is a vector version of INTGRTE and EQNOSTV is a
vectorized EQNOST.

These new subroutines are then:*

      SUBROUTINE INTGRTV(D,T)
      DIMENSION D(100,100), T(100,100)
      COMMON DELTAX,DELTAT,GAMMAI,V,R
      DO 32 I=2,99
      DO 32 J=2,99
      D(I,J) = D(I,J) + DELTAT*0.5*(D(I-1,J)+D(I+1,J))
     $         *DELTAX/(T(I+1,J) - T(I-1,J))
   32 CONTINUE
      RETURN
      END

*Reversing the order of the I and J loops would cause an
unvectorizable dependency; see section 4.
      SUBROUTINE EQNOSTV(P,T)
      DIMENSION P(100,100), T(100,100)
      COMMON DELTAX,DELTAT,GAMMAI,V,R
      DO 33 I=2,99
      DO 33 J=2,99
      T(I,J) = (P(I,J)/(V*R))**GAMMAI
   33 CONTINUE
      RETURN
      END



Case 2: Putting the subroutines inside the loops.

In this case, the DO 31 pair of loops becomes:

      DO 34 I=2,99
      DO 34 J=2,99
      DATA(I,J) = DATA(I,J) + DELTAT*0.5*(DATA(I+1,J)
     $          +DATA(I-1,J))*DELTAX/(TEMPURTR(I+1,J)
     $          -TEMPURTR(I-1,J))
      TEMPURTR(I,J) = (DATA(I,J)/(V*R))**GAMMAI
   34 CONTINUE
      RETURN
      END



The second alternative is also especially suitable for functions, 
i.e., expand the code in line as in the next example: 



      DO 35 I = 1,1000
   35 X(I) = ALOG2(Y(I)+1) ...

      FUNCTION ALOG2(X)
      DATA CST /.../
      ALOG2 = CST*ALOG(X)
      RETURN
      END

      DO 36 I = 1,1000
   36 X(I) = CST*ALOG(Y(I)+1) ...
The DO 35 loop will not vectorize because of the call to a routine
that the compiler doesn't recognize, but the DO 36 loop will vectorize.

One common coding technique is to use vector subroutines such as VADD,
VMULT, and so on. The principal part of the program may then look
like this:

      CALL VADD(A,B,C,N)
      CALL VMULT(C,A,E,N)
      CALL VADD(E,B,A,N)

Expanding these subroutines in line and, where possible, combining the
many DO loops into a few will ensure vectorization and will allow
intermediate variables to be held in registers rather than being
returned to memory:

      DO 37 I = 1,N
      A(I) = (B(I) + A(I)) * A(I) + B(I)
   37 CONTINUE

Presumably, VADD, VMULT, etc. vectorize, but the DO 37 loop is
faster because the sum A + B and the product (A + B) * A do not have
to be stored but can be kept in a register, and A does not have to be
fetched a second time. Thus, the DO 37 loop is significantly faster
than the series of calls.
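
The comparison between the three calls and the single fused loop can be made concrete with a small sketch. The bodies of VADD and VMULT below are assumptions (the note does not show them), and the wrapper names BYCALL and FUSED are hypothetical. The code uses ! comments so that a present-day Fortran compiler accepts it.

```fortran
!     Assumed bodies for library-style routines: C = A + B, C = A * B.
      SUBROUTINE VADD(A,B,C,N)
      INTEGER N, I
      REAL A(N), B(N), C(N)
      DO 10 I = 1,N
         C(I) = A(I) + B(I)
   10 CONTINUE
      RETURN
      END

      SUBROUTINE VMULT(A,B,C,N)
      INTEGER N, I
      REAL A(N), B(N), C(N)
      DO 10 I = 1,N
         C(I) = A(I) * B(I)
   10 CONTINUE
      RETURN
      END

!     Three calls: the temporaries C and E pass through memory.
      SUBROUTINE BYCALL(A,B,C,E,N)
      INTEGER N
      REAL A(N), B(N), C(N), E(N)
      CALL VADD(A,B,C,N)
      CALL VMULT(C,A,E,N)
      CALL VADD(E,B,A,N)
      RETURN
      END

!     One fused loop: no temporary arrays are needed at all.
      SUBROUTINE FUSED(A,B,N)
      INTEGER N, I
      REAL A(N), B(N)
      DO 20 I = 1,N
         A(I) = (B(I) + A(I)) * A(I) + B(I)
   20 CONTINUE
      RETURN
      END
```

Both versions leave the same values in A; the fused form simply avoids the two work arrays and the extra passes over memory.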
SECTION 4

RECURSION AND DIRECTING THE

COMPILER TO VECTORIZE

Suppose the key inner loop in your program is like the DO 41 inner loop, 
which doesn't vectorize and is therefore one that you want to spend some 
time on. 

DO 41 I = 1,100 

A(I) = A(I+L) ... 

41 CONTINUE 

If the loop is truly recursive*, the situation may be hopeless. 
However, if the value of L is such that there is no recursion (e.g., if 
L is greater than 1000), the easiest approach is to try directing the 
compiler to vectorize the loop and see if the answers remain the same. 
Placing the following compiler directive in front of the DO loop to be 
vectorized allows the compiler to vectorize a loop that has an apparent 
vector dependency or recursion: 

CDIR$ IVDEP (see CFT manual sections 5.4, 5.4.3) 

In other words, if either real or imagined recursion causes the loop
not to be automatically vectorized by the compiler, the IVDEP compiler
directive causes the computations to be done in vector mode. Note,
however, that if CALL or IF statements or anything besides or in
addition to apparent recursion prevents vectorization, CDIR$ IVDEP has
no effect. Also, the effect of the IVDEP is limited to only the next DO
loop; a separate IVDEP must be provided for each loop with an ignorable
dependency, and the IVDEP should immediately precede the DO statement.

Returning to the example, first try printing some of the A terms the
first few times through the vectorized loop to assure that vectorizing
the loop does not change the results. Though this hardly proves that no
problems can arise, it may help your analysis. This brute force
approach is inelegant and error-prone, especially in those cases where
one's insight into the physics of the situation does not provide some
assurance that the loop is recursion-free. If the value of L differs
with each pass through the loop, you may find it useful to make two
copies of the loop, one with the compiler directive to vectorize
(ignoring vector dependencies) and one without the directive, and select
for use the vectorizable copy only when it is correct. This means that
you need to know what values of L are acceptable for vectorization of
which loops.
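
A minimal sketch of the two-copy approach follows. The subroutine name SHIFT, the loop body (adding 1.0), and the LO/HI bounds are assumptions made for illustration; the caller must ensure that LO+L and HI+L are valid subscripts. The selection test assumes a loop increment of +1, so that a non-negative L cannot feed a result back into a later pass. Under CFT the directive would be a CDIR$ IVDEP line; here it appears only as a comment so that a present-day compiler accepts the code.

```fortran
!     Sketch: keep two copies of the loop and pick one at run time.
!     Assumes increment +1 and that A(LO+L) .. A(HI+L) exist.
      SUBROUTINE SHIFT(A,LO,HI,L)
      INTEGER LO, HI, L, I
      REAL A(*)
      IF (L .GE. 0) THEN
!        No feedback is possible here; under CFT a CDIR$ IVDEP
!        line would be placed directly before this DO statement.
         DO 10 I = LO,HI
            A(I) = A(I+L) + 1.0
   10    CONTINUE
      ELSE
!        Feedback is possible: this copy carries no directive and
!        is left to run in scalar mode.
         DO 20 I = LO,HI
            A(I) = A(I+L) + 1.0
   20    CONTINUE
      ENDIF
      RETURN
      END
```

The two loop bodies are identical on purpose; only the directive differs, so the results are the same whichever branch is taken and only the speed changes.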



*Recursion is a buzzword used to describe the case where output is
propagated back into the input. It is explained further below. 
Some rules of thumb are the following:

   1. If the sign of L is the same as the sign of the loop
      increment, there is never any recursion.

   2. If the sign of L is the opposite of the sign of the loop
      increment, there is probably recursion. An exception is when
      the loop increment and L have a least common multiple
      larger than the maximum value of the loop index.

   3. There is no recursion of concern if the magnitude of L is
      such that there is no overlap of subscripts between the
      right and left sides of the computation. In fact, if L
      divided by the loop increment is greater than 64, you have
      no worries because the compiler breaks loops into
      64-at-a-time blocks for vectorization.

If these simple rules do not help, you may have to analyze the problem
further to determine when you can safely use vector computations.

"Recursion" is a mathematical term used to describe feedback, a noun you
may find more familiar and easier to remember. The phenomenon referred
to is the use of the output of one pass through the loop as the input
to a computation on a subsequent pass. Consider the following simple
examples:



      DO 42 I = 1,1000
   42 X(I) = X(I - 1) + 1

      SUM = 0
      DO 43 I = 1,1000
   43 SUM = SUM + X(I) * Y(I)

In the DO 42 loop, the value of X(1) is used to compute X(2); X(2)
is used to compute X(3), and so on. If this were done in vector mode,
all of the X terms would be fetched at once, 1 would be added to each of
them, and only the first value would be known to be correct. In the
DO 43 loop, the value of SUM is used for each subsequent pass
through the loop. Inserting a CDIR$ IVDEP would probably produce wrong
answers in the DO 42 loop and would have no effect on the DO 43 loop
because the reason for scalar mode of the DO 43 loop is loop collapsing
as well as recursion.

The following loops are similar to these but are nonrecursive: 



      DO 44 I = 1,1000
   44 X(I) = X(I + 1) + 1

      DO 45 I = 1,1000
   45 A(I) = A(I) + X(I) * Y(I)



In the DO 44 loop, no X value is reused after being computed so there is 
no feedback. Similarly, the DO 45 loop is not recursive because no A 
value is reused after being generated; it is merely stored. 
To understand the effects of recursion on vectorization, it is important
to realize that vectorization is an essentially parallel computation on
a group of values. Consider the simple case:

      DO 46 I = 2,3
      A(I-1) = 3.0
   46 B(I) = A(I)

This loop is equivalent to the sequential statements:

      A(1) = 3.0
      B(2) = A(2)
      A(2) = 3.0
      B(3) = A(3)

Vectorization, in effect, reorders the sequence to:

      A(1) = 3.0
      A(2) = 3.0
      B(2) = A(2)     (now 3.0)
      B(3) = A(3)

and the "vectorized" sequence probably produces different results.

Whenever CFT encounters a loop which might be recursive, it generates
correct scalar code rather than fast and possibly incorrect vector code,
because vector and scalar versions of a recursive loop generally produce
different results.

Recursion can cause problems if numerically equal subscript values occur 
on different passes through a DO loop and at least one of them is on the 
left of the equal sign. 

There are two general classes of recursion: 

1. A value is prematurely destroyed if vectorized. The preceding 
is an example of this. The loop can often be made 
non-recursive by reordering the statements or by using 
temporary storage. 



      DO 47 I = 2,3
      B(I) = A(I)
   47 A(I-1) = 3.0

      DO 48 I = 2,3
      TEMP = A(I)
      A(I-1) = 3.0
   48 B(I) = TEMP
   2. A value is not ready when needed. This is the one-line
      recursion relationship:

      DO 49 I = 2,3
   49 A(I) = B*A(I-1)+C

Because of the group computation in vector mode, both input A values
(the group A(1), A(2)) are fetched before either output value (the
group A(2), A(3)) is stored. Thus, the original A(2), not B*A(1)+C, is
used to compute A(3), a probable error.
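
The difference between the two execution orders can be reproduced on any machine with a small emulation. The subroutine names RECSCA and RECVEC and the work array TMP are hypothetical; RECVEC merely imitates vector mode by fetching every operand before storing any result, and it models a single vector group (the note states that CFT processes 64 elements at a time).

```fortran
!     Scalar order: each pass sees the value stored by the pass
!     before it.
      SUBROUTINE RECSCA(A,N,B,C)
      INTEGER N, I
      REAL A(N), B, C
      DO 10 I = 2,N
         A(I) = B*A(I-1) + C
   10 CONTINUE
      RETURN
      END

!     Emulated vector order: fetch all operands into TMP first,
!     then compute and store the whole group.
      SUBROUTINE RECVEC(A,TMP,N,B,C)
      INTEGER N, I
      REAL A(N), TMP(N), B, C
      DO 20 I = 2,N
         TMP(I) = A(I-1)
   20 CONTINUE
      DO 30 I = 2,N
         A(I) = B*TMP(I) + C
   30 CONTINUE
      RETURN
      END
```

With A = (1, 1, 1), B = 2, and C = 1, the scalar version gives A = (1, 3, 7), while the emulated vector version gives A = (1, 3, 3), because A(3) is computed from the original A(2).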

In many cases, CFT is not able to determine whether or not a subscript
leads to recursion. For example:

      DO 410 I = 1,10
  410 A(I,J) = A(I-1,JPLUS1)

is recursive if J is ever the same as JPLUS1. If the programmer knows,
from the physics of the situation, for example, that J and JPLUS1 will
never be the same, then

CDIR$ IVDEP

is appropriate. Alternatively, the loop could be rewritten as

      DO 411 I = 1,3
  411 A(I,J) = A(I-1,J+1)

and CFT would automatically vectorize it. In general, it is an aid to
vectorization if subscripts can be explicitly written out. For example:

      DO 412 I = 1,3
  412 A(I) = A(I+N)

In many cases, N is not really a "variable"; it has a constant value and
often never even changes from run to run. A "variable" is used simply
to provide some flexibility in case the problem ever changes. Rather
than initialize N with

      DATA N/3/
   or N = 3
it is much better to use:

      PARAMETER (N = 3)

so that CFT automatically vectorizes the sample loop.

In the following illustrations of recursive and non-recursive loops,
assume that X(I) = 2I, Y(I) = -I, and Z(I) = 0 before each loop is
run. The final values of X are given after each loop for subscripts
1 through 5.

Recursive:

      DO 413 I = 2,5
      X(I) = X(I - 1) + 1.
  413 CONTINUE

      X = 2, 3, 4, 5, 6

CDIR$ IVDEP
      DO 414 I = 2,5
      X(I) = X(I - 1) + 1.
  414 CONTINUE

      X = 2, 3, 5, 7, 9

the last three of which are bad values because of forced
vectorization of a recursive loop.

Non-recursive:

      DO 415 I = 2,5
      X(I) = Y(I - 1) + 1.
  415 CONTINUE

      X = 2, 0, -1, -2, -3

The compiler vectorizes the DO 415 loop automatically.

      DO 416 I = 1,5
      X(I) = X(I) + 1.
  416 CONTINUE

      X = 3, 5, 7, 9, 11

The compiler vectorizes the DO 416 loop automatically.

      DO 417 I = 1,5
      X(I) = Z(I) + X(I) * Y(I)
  417 CONTINUE

      X = -2, -8, -18, -32, -50

The compiler vectorizes the DO 417 loop automatically.
      L = 10
      DO 418 I = 1,5
      X(I) = X(I + L)
  418 CONTINUE

      X = 22, 24, 26, 28, 30

      L = 10
CDIR$ IVDEP
      DO 419 I = 1,5
      X(I) = X(I + L)
  419 CONTINUE

      X = 22, 24, 26, 28, 30



Here, the 418 loop does not vectorize but the 419 loop does. The 
compiler does not know that L is not negative. 



Recursive:

      L = -1
      DO 420 I = 1,5
      X(I) = X(I + L)
  420 CONTINUE

      X = 0, 0, 0, 0, 0

      L = -1
CDIR$ IVDEP
      DO 421 I = 1,5
      X(I) = X(I + L)
  421 CONTINUE

      X = 0, 2, 4, 6, 8

The last four are incorrect because of forced vectorization of a
recursive loop.

Here, the value of X(1) is fed back to compute X(2), i.e., the loop is
recursive and the vectorized version of the loop produces wrong
results. In scalar mode, the computations proceed:

      X(1) = X(0)     (=0 by assumption)
      X(2) = X(1)     (=0 from last computation)
      X(3) = X(2)     (=0 from last computation)
      X(4) = X(3)     (=0 from last computation)
      X(5) = X(4)     (=0 from last computation)
CFT generates code that executes as above because its approach to
vectorization is conservative. When forced to vectorize, the loop
executes:

      X = shifted X = (0, 2, 4, 6, 8)

so that X(2) = original value of X(1), not the just-computed value;
similarly, X(3) = original X(2), not the newly computed value, etc.
Also, in this example it is assumed that 0 is a legal subscript, i.e., X
is declared DIMENSION X(0:50).

Many examples of recursion are of the following form where L is negative 
(that is, opposite in sign to the increment of J, which is 1 here) and K 
is positive (of the same sign as the increment of I in this example) : 

DO 422 I = 1,100 
DO 422 J = 1,100 
422 A(I,J) = A(I + K, J + L) ... 



Here, by inverting the order of the loops, you can remove the recursion 
and allow vectorization by using the CDIR$ IVDEP directive. This type 
of loop order inversion is frequently too complex to analyze easily and 
you may need to go back to the physics of the situation or to that 
unfortunate alternative of printing gobs of values to determine a 
reasonable way to reorder or rewrite the code. 

The following examples show a simple but real case where the compiler's 
overly conservative attitude is easy to see and correct. Case 1 runs 
about four times slower than Case 2. The cause of such a large speed 
increase is the complexity of the loop. Loops with very few 
computations generally have less speed up. 
CASE 1

      NL1 = 1
      NL2 = 2

      DO 423 KX = 2,3
      DO 423 KY = 2,21
      DU1 = U1(KX,KY + 1,NL1) - U1(KX,KY - 1,NL1)
      DU2 = U2(KX,KY + 1,NL1) - U2(KX,KY - 1,NL1)
      DU3 = U3(KX,KY + 1,NL1) - U3(KX,KY - 1,NL1)
      U1(KX,KY,NL2) = U1(KX,KY,NL1)+A11*DU1+A12*DU2+A13*DU3
     $ +SIG*(U1(KX+1,KY,NL1)-2.*U1(KX,KY,NL1)+U1(KX-1,KY,NL1))
      U2(KX,KY,NL2) = U2(KX,KY,NL1)+A21*DU1+A22*DU2+A23*DU3
     $ +SIG*(U2(KX+1,KY,NL1)-2.*U2(KX,KY,NL1)+U2(KX-1,KY,NL1))
      U3(KX,KY,NL2) = U3(KX,KY,NL1)+A31*DU1+A32*DU2+A33*DU3
     $ +SIG*(U3(KX+1,KY,NL1)-2.*U3(KX,KY,NL1)+U3(KX-1,KY,NL1))
  423 CONTINUE

The values of NL1 and NL2 are swapped before the next pass through the
loop.
CASE 2

      DO 424 KX = 2,3
CDIR$ IVDEP
      DO 424 KY = 2,21
      DU1 = U1(KX,KY + 1,NL1) - U1(KX,KY - 1,NL1)
      DU2 = U2(KX,KY + 1,NL1) - U2(KX,KY - 1,NL1)
      DU3 = U3(KX,KY + 1,NL1) - U3(KX,KY - 1,NL1)
      U1(KX,KY,NL2) = U1(KX,KY,NL1)+A11*DU1+A12*DU2+A13*DU3
     $ +SIG*(U1(KX+1,KY,NL1)-2.*U1(KX,KY,NL1)+U1(KX-1,KY,NL1))
      U2(KX,KY,NL2) = U2(KX,KY,NL1)+A21*DU1+A22*DU2+A23*DU3
     $ +SIG*(U2(KX+1,KY,NL1)-2.*U2(KX,KY,NL1)+U2(KX-1,KY,NL1))
      U3(KX,KY,NL2) = U3(KX,KY,NL1)+A31*DU1+A32*DU2+A33*DU3
     $ +SIG*(U3(KX+1,KY,NL1)-2.*U3(KX,KY,NL1)+U3(KX-1,KY,NL1))
  424 CONTINUE



I hope these examples shed some light on this rather abstruse topic. 
SECTION 5

IRREGULAR ADDRESSING

Irregular or nonlinear addressing arises in situations such as those
using data structures requiring subscripted subscripts. Subscripted
subscripts do not occur explicitly in FORTRAN-66 code but may
effectively occur in certain types of programs, as below.

In the DO 51 loop, Y essentially has a subscripted subscript:

      DO 51 I = 1,1000
      J = INDEX(I)
      X(I) = Y(J) ...
   51 CONTINUE

Change to:

      DO 52 I = 1,1000
      J = INDEX(I)
   52 TEMP(I) = Y(J)
      DO 53 I = 1,1000
   53 X(I) = TEMP(I) ...

The DO 51 loop cannot vectorize with CFT 1.06 because of the nonlinear 
indexing. The DO 52 loop similarly doesn't vectorize but the DO 53 loop
does and, if the computations are extensive, the speed-up can be 
dramatic. 

In general, if the computations are sufficiently complicated to warrant
the work, you can restructure the loop into two or three loops. The
first new loop is a GATHER loop in which all the data to be manipulated
are collected into vectors. Next is the computation loop. Last is the
SCATTER loop, in which results are distributed from the vector used in
the computation loop to their proper locations. Quite often, as in this
example, there is no SCATTER loop. There are $SCILIB routines for doing
the GATHER and SCATTER (see Appendix A and CRI Manual 2240014).
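
The three-loop pattern can be sketched in general form as follows. The routines GATH and SCAT are hypothetical helpers written for this illustration and are not the $SCILIB routines, whose calling sequences are not reproduced here; the computation (2*W + 1) is likewise an arbitrary stand-in. The sketch assumes that the entries of INDX are distinct, so that the order of the scatter does not matter.

```fortran
!     Hypothetical helper: V(I) = SRC(INDX(I))
      SUBROUTINE GATH(N,V,SRC,INDX)
      INTEGER N, INDX(N), I
      REAL V(N), SRC(*)
      DO 10 I = 1,N
         V(I) = SRC(INDX(I))
   10 CONTINUE
      RETURN
      END

!     Hypothetical helper: DST(INDX(I)) = V(I)
      SUBROUTINE SCAT(N,DST,INDX,V)
      INTEGER N, INDX(N), I
      REAL DST(*), V(N)
      DO 20 I = 1,N
         DST(INDX(I)) = V(I)
   20 CONTINUE
      RETURN
      END

!     Gather, compute on a dense work vector, then scatter back.
!     Only the DO 30 loop has regular (linear) addressing.
      SUBROUTINE UPDATE(N,Y,INDX,WORK)
      INTEGER N, INDX(N), I
      REAL Y(*), WORK(N)
      CALL GATH(N,WORK,Y,INDX)
      DO 30 I = 1,N
         WORK(I) = 2.0*WORK(I) + 1.0
   30 CONTINUE
      CALL SCAT(N,Y,INDX,WORK)
      RETURN
      END
```

The payoff grows with the amount of arithmetic in the middle loop, since the gather and scatter loops themselves still use irregular addressing.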

The following example illustrates this again for a particle pushing 
algorithm. 
CASE 1

      DO 54 K = 1,150
      IX = GRD(K)
      XI = IX
      VX(K) = VX(K) + EX(IX) + (XX(K) - XI) * DEX(IX)
      XX(K) = XX(K) + VX(K) + FLX
C IX IS, IN EFFECT, A SUBSCRIPTED SUBSCRIPT
      IR = XX(K)
      RI = IR
      RX1 = XX(K) - RI
      IR = IR - (IR/64) * 64
      XX(K) = RI + RX1
C IR IS AN IRREGULAR SUBSCRIPT
      RH(IR) = RH(IR) + 1.0 - RX1
      RH(IR + 1) = RH(IR + 1) + RX1
   54 CONTINUE
CASE 2

      DO 55 K = 1,150
      IX = GRD(K)
      XIV(K) = IX
      EXC(K) = EX(IX)
      DEXC(K) = DEX(IX)
   55 CONTINUE
C GATHER LOOP ABOVE
C XI IS VECTORIZED INTO XIV
C EX IS GATHERED INTO EXC
C DEX IS GATHERED INTO DEXC
      DO 56 K = 1,150
      VX(K) = VX(K) + EXC(K) + (XX(K) - XIV(K)) * DEXC(K)
      XX(K) = XX(K) + VX(K) + FLX
      IRV(K) = XX(K)
      RI = IRV(K)
      RX1V(K) = XX(K) - RI
      XX(K) = RI + RX1V(K)
   56 CONTINUE
C COMPUTATION LOOP WHICH VECTORIZES IS ABOVE
      DO 57 K = 1,150
      RH(IRV(K)) = RH(IRV(K)) + 1.0 - RX1V(K)
      RH(IRV(K) + 1) = RH(IRV(K) + 1) + RX1V(K)
   57 CONTINUE
C SCATTER LOOP

The code in Case 2 runs more than twice as fast as that in Case 1. 
SECTION 6

MISCELLANEOUS TECHNIQUES 

This section includes a group of examples that do not readily fit into
the categories discussed above. In some sense, this is a
bag-of-tricks chapter demonstrating several additional loop
restructuring techniques as well as multi-loop techniques. The
techniques here are harder to describe in a general and systematic
fashion.

A matrix multiply represents an algorithm that can benefit from loop
restructuring. For example, the following code illustrates the common
way of coding the matrix multiply:

      DO 61 I = 1,L
      DO 61 J = 1,M
      C(I,J) = 0.0
      DO 61 K = 1,N
   61 C(I,J) = C(I,J) + A(I,K) * B(K,J)

The recursion on C(I,J) and loop collapsing prevent vectorization now
(CFT 1.06) and will always prevent vectorization as full as that of the
rewrite below. This rewritten code vectorizes fully, resulting in a
speedup of 5 to 10 times:

      DO 63 I = 1,L
      DO 62 J = 1,M
   62 C(I,J) = 0.0
      DO 63 K = 1,N
      DO 63 J = 1,M
   63 C(I,J) = C(I,J) + A(I,K) * B(K,J)

In many similar situations, although the result is not going into a 
subscripted variable but into a scalar temporary you can reorder the 
loops and store the results as a vector temporary instead of as a scalar 
temporary. 
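
As a sketch of replacing a scalar temporary with a vector temporary, consider summing each row of a matrix. The routine names ROWSCA and ROWVEC and the row-sum problem itself are chosen here only for illustration. In ROWSCA the inner loop accumulates into the scalar S, which is a reduction; in ROWVEC the loops are reordered so that the inner loop runs over I and accumulates into the vector R.

```fortran
!     Scalar temporary: the inner loop is a reduction into S.
      SUBROUTINE ROWSCA(A,LDA,L,N,R)
      INTEGER LDA, L, N, I, K
      REAL A(LDA,N), R(L), S
      DO 20 I = 1,L
         S = 0.0
         DO 10 K = 1,N
            S = S + A(I,K)
   10    CONTINUE
         R(I) = S
   20 CONTINUE
      RETURN
      END

!     Vector temporary: loops reordered so that the inner loop has
!     no feedback; R(I) depends only on its own previous value.
      SUBROUTINE ROWVEC(A,LDA,L,N,R)
      INTEGER LDA, L, N, I, K
      REAL A(LDA,N), R(L)
      DO 30 I = 1,L
         R(I) = 0.0
   30 CONTINUE
      DO 50 K = 1,N
         DO 40 I = 1,L
            R(I) = R(I) + A(I,K)
   40    CONTINUE
   50 CONTINUE
      RETURN
      END
```

The reordered inner loop also steps through A with unit increment, since FORTRAN stores arrays by columns.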
The next example shows several stages in the speed-up process. Case 2
is more than 50% faster than Case 1 and Case 3 is almost four times as
fast as Case 1.

CASE 1

      Q = 0.0
      DO 64 K = 1,996,5
      Q = Q + Z(K) * X(K) + Z(K + 1) * X(K + 1)
     $      + Z(K + 2) * X(K + 2) + Z(K + 3) * X(K + 3)
     $      + Z(K + 4) * X(K + 4)
   64 CONTINUE

In this original case, the loop was quintupled, presumably to cut loop
overhead or allow greater overlap of operations.

CASE 2

      DO 65 K = 1,996,5
      TP(K) = Z(K) * X(K) + Z(K + 1) * X(K + 1)
     $      + Z(K + 2) * X(K + 2) + Z(K + 3) * X(K + 3)
     $      + Z(K + 4) * X(K + 4)
   65 CONTINUE
      Q = 0.0
      DO 66 K = 1,996,5
      Q = Q + TP(K)
   66 CONTINUE
CASE 3

      Q = SDOT(1000,Z,1,X,1)

Here, SDOT is the BLA single-precision dot product function (see
Appendix A or CRI publication 2240204).

As an aid to remembering the calling sequences for the basic linear
algebra functions: the first argument is the vector length, and the
remaining arguments come in pairs, a vector operand followed by its
increment in memory.

Thus, if A and B are declared DIMENSION A(M,N),B(N,L) and you want to
compute the dot product of the Ith row of A with the Jth column of B,
use:

      AB = SDOT(N,A(I,1),M,B(1,J),1)

where N = the vector length = number of elements in each vector operand,
A(I,1) and B(1,J) are the starting locations in memory of the operands,
M = memory increment of the first operand vector, and 1 = memory
increment of the second operand vector.
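
The start-address and increment convention can be tried out with the
sketch below. DOTSTR is a hypothetical stand-in written for this
example so that it runs without $SCILIB; it is not the library SDOT and
handles positive increments only. The array sizes and contents are
likewise assumptions.

```fortran
      PROGRAM ROWCOL
C     Illustrative sketch only.  DOTSTR is a hypothetical stand-in
C     written here so the example is self-contained; it is not the
C     $SCILIB SDOT routine.
      INTEGER M, N, L
      PARAMETER (M = 2, N = 3, L = 2)
      REAL A(M,N), B(N,L), AB, DOTSTR
      INTEGER I, J, K
      DO 30 K = 1,N
      DO 10 I = 1,M
      A(I,K) = FLOAT(10*I + K)
   10 CONTINUE
      DO 20 J = 1,L
      B(K,J) = FLOAT(K + J)
   20 CONTINUE
   30 CONTINUE
      I = 2
      J = 1
C     Row I of A: start at A(I,1), step M words between elements.
C     Column J of B: start at B(1,J), step 1 word.
      AB = DOTSTR(N, A(I,1), M, B(1,J), 1)
      PRINT *, 'ROW 2 OF A DOT COLUMN 1 OF B =', AB
      END

      REAL FUNCTION DOTSTR(N, X, IX, Y, IY)
C     Dot product of N elements taken from X with increment IX
C     and from Y with increment IY (positive increments only).
      INTEGER N, IX, IY, K, KX, KY
      REAL X(*), Y(*)
      DOTSTR = 0.0
      KX = 1
      KY = 1
      DO 40 K = 1,N
      DOTSTR = DOTSTR + X(KX) * Y(KY)
      KX = KX + IX
      KY = KY + IY
   40 CONTINUE
      RETURN
      END
```

With these assumed values, row 2 of A is (21, 22, 23) and column 1 of B
is (2, 3, 4), so the printed result is 200. Because FORTRAN stores
arrays by column, stepping M words from A(I,1) walks along row I.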

Appendix A lists the BLA subroutines briefly as well as a few other 
useful routines that are in $SCILIB. 

A planned enhancement to CFT is to perform scalar operations for
individual statements in otherwise vectorizable loops. An industrious
programmer can achieve this now by using VFUNCTIONs:

      CDIR$ VFUNCTION ...

which tells the compiler of external non-library vector functions. For
CFT 1.06, these can be written only in CAL. Section 5 and Appendix F of
the CFT manual provide the information necessary to link such routines
to FORTRAN programs.
SECTION 7

REMOVING IF STATEMENTS AND USING

BUILT-IN FUNCTIONS

CFT 1.06 does not vectorize code blocks that contain IF statements.
Many types of loops with IF statements are not hopeless, however.
Several things can be done depending on the structure of the code. CFT
will eventually vectorize many of these for you but in the meantime you
can help by using some of the built-in functions such as AMAX1, ABS,
CVMGT, CVMGZ, etc. (see CFT manual appendixes B and C). For example:

      DO 71 I = 1,1000
      IF(A(I) .LT. 0.) A(I) = 0.
   71 B(I) = SQRT (A(I)) ...

which can be converted to:

      DO 72 I = 1,1000
      A(I) = AMAX1(A(I),0.)
   72 B(I) = SQRT (A(I)) ...

The DO 71 loop doesn't vectorize now; the DO 72 loop does.
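
To check that the two forms agree, a small program along the following
lines can be run. The program name, the five assumed input values, and
the duplicated arrays are illustrative only.

```fortran
      PROGRAM CLAMP
C     Illustrative sketch only: array contents are assumed values.
      INTEGER N
      PARAMETER (N = 5)
      REAL A1(N), A2(N), B1(N), B2(N)
      INTEGER I
      DATA A1 /4.0, -1.0, 9.0, -16.0, 0.25/
      DO 10 I = 1,N
      A2(I) = A1(I)
   10 CONTINUE
C     Original form: the IF statement sits inside the loop.
      DO 20 I = 1,N
      IF (A1(I) .LT. 0.) A1(I) = 0.
      B1(I) = SQRT(A1(I))
   20 CONTINUE
C     Rewritten form: AMAX1 replaces the IF statement.
      DO 30 I = 1,N
      A2(I) = AMAX1(A2(I), 0.)
      B2(I) = SQRT(A2(I))
   30 CONTINUE
      DO 40 I = 1,N
      PRINT *, B1(I), B2(I)
   40 CONTINUE
      END
```

Each output line shows the same element from both forms; the negative
inputs are clamped to zero before the square root in either case.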

All the built-in arithmetic functions in FORTRAN (in $FTLIB) have both
vector and scalar versions; the compiler calls the vector version for
vectorizable loops*. The vector merge operations, CVMG*, are typeless
functions that allow you to merge the results of different vector
computations such as the following:

      DO 74 I = 1,1000
      IF(A(I) .LT. 0.) GOTO 73
      B(I) = A(I) + D(I) ...
      GOTO 74
   73 B(I) = A(I) * E(I) ...
   74 CONTINUE

This can be rewritten to vectorize as

      DO 75 I = 1,1000
   75 B(I) = CVMGT (A(I) * E(I) ..., A(I) + D(I) ..., A(I).LT.0.)

*Some are actually pseudo vector routines; they allow the loop to
vectorize but are performed in scalar mode.

The mnemonic for the CVMG* group of functions is that the last letter of
the name is the condition on which the first argument is used. Since
these functions are Boolean, they can be used with integer or floating
operands and results and in scalar loops as well as in vector loops.
Thus, if you are not sure that you are computing the value of B
correctly, you can put a print statement in the loop, which causes it to
be scalar, and still obtain the same results, albeit much more slowly
than before.
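
The merge idea can be tried on a compiler without the CVMG* functions
by using a stand-in. In the sketch below, SELT is a hypothetical
REAL-only function written for this example; unlike CVMGT it is not
typeless. The array contents are assumed values.

```fortran
      PROGRAM MERGE1
C     Illustrative sketch only.  SELT is a hypothetical REAL-only
C     stand-in for CVMGT so the example runs without CFT.
      INTEGER N
      PARAMETER (N = 4)
      REAL A(N), D(N), E(N), B1(N), B2(N), SELT
      INTEGER I
      DATA A /1.0, -2.0, 3.0, -4.0/
      DATA D /10.0, 20.0, 30.0, 40.0/
      DATA E /0.5, 0.5, 0.5, 0.5/
C     Branching form.
      DO 20 I = 1,N
      IF (A(I) .LT. 0.) GOTO 10
      B1(I) = A(I) + D(I)
      GOTO 20
   10 B1(I) = A(I) * E(I)
   20 CONTINUE
C     Merge form: both expressions are evaluated for every I and
C     the third argument picks which one is stored.
      DO 30 I = 1,N
      B2(I) = SELT(A(I) * E(I), A(I) + D(I), A(I) .LT. 0.)
   30 CONTINUE
      DO 40 I = 1,N
      PRINT *, B1(I), B2(I)
   40 CONTINUE
      END

      REAL FUNCTION SELT(X, Y, L)
C     Returns X if L is true, otherwise Y.
      REAL X, Y
      LOGICAL L
      SELT = Y
      IF (L) SELT = X
      RETURN
      END
```

Note that the merge form computes both candidate expressions for every
element, so neither expression may fault on elements where it is not
selected.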

Table 2 lists the merge functions and some of the other ones that you 
may want in similar situations. 



Table 2.  Some typical in-line CFT functions.

FUNCTION NAME       RESULT TYPE     ARGUMENT TYPES   OPERATION

AMAX0(I1,I2,...)    Real            Integer          Largest Ii, floated

AMAX1(X1,X2,...)    Real            Real             Largest Xi

MAX0(I1,I2,...)     Integer         Integer          Largest Ii

MAX1(X1,X2,...)     Integer         Real             Largest Xi, truncated

CVMGT(X,Y,L)        Boolean         Boolean          X if L true,
                    (single word)   (single word)    otherwise Y

CVMGZ(X,Y,Z)        Boolean         Boolean          X if Z is zero,
                    (single word)   (single word)    Y if Z is nonzero



Another technique that works in some cases is inverting the order of
loops so that the IF statements are in the outer loops rather than in
the inner loops. Also, if the purpose of the IF test is to separate an
exceptional case from other cases and if the computation is extensive,
it may be worthwhile to write a loop to do the testing and to write a
vectorizing loop for the computations.

Here are some more examples:

      Y(I) = 1.0
      IF(X(I).EQ.0.) GOTO 76
      Y(I) = 1.0/X(I)
   76 CONTINUE

Change this to:

      Y(I) = 1.0/CVMGZ (1.,X(I),X(I)) ...

which allows a loop containing it to vectorize and yet does not cause a
divide fault. Here, CVMGZ selects 1. when X(I) = 0; otherwise it
selects X(I). Alternatively, if this exceptional condition only occurs
in cases when the result is not used, you can surround the loop
containing it with CALL CLEARFI and CALL SETFI to turn the floating
point interrupt off and then on again. This allows generation of an
infinity without interrupting the program.
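
The guarded divide can be exercised with the sketch below. SELZ is a
hypothetical REAL-only stand-in for CVMGZ written for this example, and
the four input values are assumptions.

```fortran
      PROGRAM SAFDIV
C     Illustrative sketch only.  SELZ is a hypothetical REAL-only
C     stand-in for CVMGZ so the example runs without CFT.
      INTEGER N
      PARAMETER (N = 4)
      REAL X(N), Y(N), SELZ
      INTEGER I
      DATA X /2.0, 0.0, -4.0, 0.0/
C     The divisor becomes 1. wherever X(I) is zero, so Y(I) is
C     1.0 there and 1.0/X(I) everywhere else.
      DO 10 I = 1,N
      Y(I) = 1.0 / SELZ(1., X(I), X(I))
   10 CONTINUE
      DO 20 I = 1,N
      PRINT *, X(I), Y(I)
   20 CONTINUE
      END

      REAL FUNCTION SELZ(X, Y, Z)
C     Returns X if Z is zero, otherwise Y.
      REAL X, Y, Z
      SELZ = Y
      IF (Z .EQ. 0.) SELZ = X
      RETURN
      END
```

For the assumed inputs the Y values are 0.5, 1.0, -0.25, and 1.0, which
matches what the branching form would store.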

The next example illustrates loop reordering and IF statement removal:

CASE 1

      DO 77 K = 1,3
      FR(K) = 0.
   77 CONTINUE
      DO 79 JA = 1,500
      IF (JA .EQ. IA) GOTO 79
      DS = 0.
      DO 78 K = 1,3
      A(K) = RS(K,IA) - RS(K,JA)
      DS = DS + A(K) ** 2
   78 CONTINUE
      DS = SQRT(DS)
      IF (DS .GT. RAD) GOTO 79
      ...
   79 CONTINUE

CASE 2

      DO 710 JA = 1,500
      DSV(JA) = 0.
C     DSV IS A VECTOR OF DS VALUES
  710 CONTINUE
      DO 712 K = 1,3
      DO 711 JA = 1,500
      AM(K,JA) = RS(K,IA) - RS(K,JA)
  711 DSV(JA) = DSV(JA) + AM(K,JA) ** 2
  712 CONTINUE
      DO 713 JA = 1,500
      DSV(JA) = SQRT(DSV(JA))
  713 CONTINUE
      DO 714 JA = 1,500
      IF (DSV(JA) .GT. RAD) GOTO 714
      ...
  714 CONTINUE

Some of the loops in Case 1 vectorize but the vector length is only 3.
In Case 2, the inner loops vectorize with a vector length of 500. In
particular, the 713 loop uses a vector square root, saving a great deal
of time. As it turns out, in the "real life" example, most of the time
DS was greater than RAD (last statement shown in the loop) so the rest
of the loop did not need any work. Even though additional vectorization
could be done, it would not have been very productive. With the change
illustrated, the entire kernel ran more than four times faster than the
original.
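
A reduced version of this restructuring is sketched below so the two
orderings can be compared directly. The array size NA, the value of IA,
and the contents of RS are assumptions chosen to keep the example
small; the JA .EQ. IA and RAD tests are left out because they do not
affect the distances themselves.

```fortran
      PROGRAM DIST
C     Illustrative sketch only: NA, IA and the contents of RS are
C     assumed values chosen to keep the example small.
      INTEGER NA
      PARAMETER (NA = 6)
      REAL RS(3,NA), A(3), AM(3,NA), DSV(NA), DS, ERR
      INTEGER IA, JA, K
      DO 20 JA = 1,NA
      DO 10 K = 1,3
      RS(K,JA) = FLOAT(K * JA)
   10 CONTINUE
   20 CONTINUE
      IA = 2
C     Reordered form: the long JA loops are innermost.
      DO 30 JA = 1,NA
      DSV(JA) = 0.
   30 CONTINUE
      DO 50 K = 1,3
      DO 40 JA = 1,NA
      AM(K,JA) = RS(K,IA) - RS(K,JA)
      DSV(JA) = DSV(JA) + AM(K,JA) ** 2
   40 CONTINUE
   50 CONTINUE
      DO 60 JA = 1,NA
      DSV(JA) = SQRT(DSV(JA))
   60 CONTINUE
C     Original form, used here only to check the reordered result.
      ERR = 0.
      DO 80 JA = 1,NA
      DS = 0.
      DO 70 K = 1,3
      A(K) = RS(K,IA) - RS(K,JA)
      DS = DS + A(K) ** 2
   70 CONTINUE
      DS = SQRT(DS)
      ERR = AMAX1(ERR, ABS(DS - DSV(JA)))
   80 CONTINUE
      PRINT *, 'LARGEST DIFFERENCE =', ERR
      END
```

The price of the reordering is storage: the three-element scratch array
A becomes the 3 by NA array AM, and the scalar DS becomes the vector
DSV.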
APPENDIX A

$SCILIB SUBROUTINES

This appendix summarizes the scientific library subroutines.

For a current and more complete description of these functions, refer to
the Library Subroutine Reference Manual, CRI Publication 2240014.

LEGEND:

N       Vector length
X,Y     Floating point vectors
IX,IY   Increments in memory of floating point vectors
C,D     Complex vectors
IC,ID   Increments in memory of complex vectors
NB      Number of bits per word selected for PACK/UNPACK
NW      Number of words in unpacked array.

Name (Parameters)    Type               Purpose

ISAMAX(N,X,IX)       Integer function   Index to real array element
                                        having maximum absolute value.
ICAMAX(N,C,IC)       Integer function   Index to complex array element
                                        having maximum modulus.
SASUM(N,X,IX)        Real function      Sums the absolute values of a
                                        real array.
SCASUM(N,C,IC)       Real function      Sums the absolute values of
                                        real and imaginary parts of
                                        complex array.
SAXPY(N,X,IX,Y,IY)   Subroutine         Performs vector computation
                                        y <- ax+y on real arrays x,y.
CAXPY(N,C,IC,D,ID)   Subroutine         Performs vector computation
                                        y <- ax+y on complex arrays x,y.
SCOPY(N,X,IX,Y,IY)   Subroutine         Copies real array x into real
                                        array y.
CCOPY(N,C,IC,D,ID)   Subroutine         Copies complex array c into
                                        complex array d.
SDOT(N,X,IX,Y,IY)    Real function      Dot product of real arrays x,y.
CDOTC(N,C,IC,D,ID)   Complex function   Dot product of complex arrays
                                        c,d.
CDOTU(N,C,IC,D,ID)   Complex function   Dot product of complex arrays
                                        c,d.
SNRM2(N,X,IX)        Real function      Euclidean norm of real array x.
SCNRM2(N,C,IC)       Real function      Euclidean norm of complex
                                        array c.
SROT(N,X,IX,Y,IY)    Subroutine         Performs Givens transformation
                                        on real arrays x,y.
SROTG(...)           Subroutine         Calculates parameters for SROT.
SROTM(...)           Subroutine         Modified Givens transformation.
SROTMG(...)          Subroutine         Sets up parameters for
                                        modified Givens.
SSCAL(N,A,X,IX)      Subroutine         Rescales real array x:
                                        x <- ax, a real.
CSSCAL(N,A,C,IC)     Subroutine         Rescales complex array c:
                                        c <- ac, with a real.
CSCAL(N,A,C,IC)      Subroutine         Rescales complex array c:
                                        c <- ac, with a complex.
SSWAP(N,X,IX,Y,IY)   Subroutine         Swaps real arrays x,y.
CSWAP(N,C,IC,D,ID)   Subroutine         Swaps complex arrays c,d.
MXMA(...)            Subroutine         Completely general matrix
                                        multiply.
CFFT2(...)           Subroutine         Fourier transforms binary
                                        radix complex array.
RCFFT2(...)          Subroutine         Fourier transforms binary
                                        radix real to complex.
CRFFT2(...)          Subroutine         Fourier transforms binary
                                        radix complex array to real.
PACK(P,NB,U,NW)      Subroutine         Packs power of 2 bit partial
                                        word lists.
UNPACK(P,NB,U,NW)    Subroutine         Unpacks list into power of 2
                                        bit partial words.
MINV(...)            Subroutine         Returns solution of general
                                        linear equation set, matrix
                                        inverse optional.
SSUM(N,X,IX)         Real function      Sums the elements of a real
                                        array.
CSUM(N,C,IC)         Complex function   Sums the elements of a
                                        complex array.
CROT(N,X,IX,Y,IY)    Subroutine         Applies complex Givens
                                        rotation.
CROTG(N,C,IC,D,ID)   Subroutine         Sets up rotational parameters
                                        for CROT.
FILTERG(...)         Subroutine         Performs general filtering and
                                        auto-correlation.
FILTERS(...)         Subroutine         Calculates symmetric filter
                                        coefficient.
OPFILT(...)          Subroutine         Wiener-Levinson equation
                                        solver.
APPENDIX B

FORTRAN DIALECTICAL DIFFERENCES

As an aid for conversions, this appendix contains a number of tables 
that compare the FORTRAN compiler dialects for several manufacturers. 
The following tables are included. 

1. Hardware dependencies 

2. Coding features 

3. Declaratives and ordering 

4. Names and variables 

5. Constants, literals, and strings 

6. Arithmetic and expressions 

7. Branching and control statements 

8. Input/output formatting 

9. Subroutines and functions 

10. Intrinsic or inline functions

11. External functions

All of these tables are based on a scan of manuals and are thus prone to
error and very prone to omissions. Further, as manufacturers bring
their FORTRAN dialects into conformance with FORTRAN X3.9-1978, some of
these differences can be expected to disappear. The tables also reflect
1975-1979 versions of FORTRAN. In particular, while CDC FTN 5 is to be
an ANSI-1978 version, here CDC means either FTN 4 or RUN FORTRAN. IBM
means FORTRAN H. Univac means either FORTRAN V or ASCII FORTRAN. ICL
means either 1900 or 2900 FORTRAN. The tables do not generally include
any FORTRAN statement, syntax, or peculiarity where CFT is believed to be
at least as general as the other dialects shown. "Unusual" and its
synonyms mean "non-CFT."
TABLE 2 CODING FEATURES
ANSI-66 ANSI-77 CDC IBM

C C,« C|«,$ C 



UN I VAC IC 

Ci« in line C 
for rest of 
line 



HONEYWELL 
c," 



Continuation allowed 
(line lengthi cords) 

Hultlple statements per 
line, separator isi 



1> 



Not allowed 



Not allowed 



Not allowed 



1» 



Multiple replacement statement Not allowed Not allowed Not allowed Yes 



1320 chars 
total 



19 



Not allowed ( for coHnent Not allowed 



Not allowed Not allowed 19001 Yes 

2900* No 



Not allowed 



I 



PROGRAM statement 



Pseudo functions (functions 
usable on 'left or right of >)| 

must be receded without 

pseudo functions. 

Partially reserved words 
some names may be illegali 
O. ..means names beginning 
with O. 

Unusual characters allowed 



Defines prog- 
ram name 

Wot allowed 



FORMAT 
END 



Not allowed 



Defines Defines files 
program name 



Not allowed Not allowed Not allowed BITS 

SUBSTR 



FTNi FORMAT 

FUNCTION 

RUNi CAU.,END 
0...,etc. 



t in CALL 
• in HS I/O 
S in name 



Name only 
allowed 

Not allowed 



( In CALL and 
for concaten- 
ation 

■ in MS IA> 
$ in 



» in CALL 



\ for •* 

t for continu- 
ation 

• in MS I/O 

I for statement 
terminator 



declaratives, executables
For RUN and UNIVAC FORTRAN V
must correct types of cons-
tants in DATA statements.

COMMON irregularities:

Initial common block lengths
must be as long as ever
required. Numbered common
must be changed to named.



££1 


AHS|-t« 


linizll 


cec 


i£iS 


UNiyAC 


JC], 


IWeWBil, 


NO 


No 


No 


No 


yaa 


Yaa 


y«a 


Yaa 


yaa 


Hot allouad 


Not allowad 


Mot allowad 


yaa 




yaa 




Ho 


Ma 


Ho 


yaa 


No 


Yaa 




Yaa 


Nona 


Nana 


Charactar 


ECS 




Charactar 


Qiaractar 


Charactar 
Abnoraul 


Ho 


Ho 


Mo 


y«a 


NO 


NO 


Ho 




Mo 


Nat Allouad 


Ho 


ITHt Ho 
«UN< yaa 











Not allo«fad yaa 



Yaa Not allowed Yaa 



Convartcd Nat allowad Convartad 

axcapt for aacapt for 

logical) logical 

coavlaa charactar 



Yaa 



MINI ignorad 



Nuatbar^ad Nona 
COMHON blocka 
changing 
langtba of 
COHHOH 



Ai Convatt 
V( Ignora 



Nona 






Haxinun number of characters 
In iwRieB 



t In 



Lower case characters ' 



ANSI-t6 
i 



TABLE 4 NAMES OF VARIABLES
ANSI-77 CDC IBM

6 7 6 



Not allowed Not allowed Not allowed Not allowed Yes 



UHIVAC 
6 



32 



Atter Initial Not allowed Not allowed 
letter 



Not allowed Not allowed Not allowed Not allowed Not allowed Treated as Not allowed Treated as 

upper case upper case 



to 

I 



Item err 

Non CFT types None 

Quad precision change to 
double and double to 
single, probably. 



Nonstandard length identifiers 
(such as R£AD*D for REALMS) 
Double precision and Quad 
precision functions nust be ^ 
changed to correspond to 
actual argunent types where 
these are changed. 

unusual constant forsis 



Alternate character codes None 
(See also third hardware Itea) 



TABLE 5 CONSTANTS, LITERALS, AND STRINGS
ANSI-66 ANSI-77 CDC IBM

None Character Mot allowed 



None 



Quad pre- 
cision 
double- 
coaplex 



UNI VAC 

Quad pre- 
cision 
double- 
conplex 
character 

Not allowed 



ICL 

Quadruple 

double-comp. 

character. 



I,J,K,L,H,N, 
B.n.Q.D. 



Not allowed 


Not allowed 


FTNi ...B 


l... toe Hex 


0... 


1900 lO... 


O. .. 






RIINt ...B 


Data init- 


for octal 


for octal 


for octal 






or 0... 


ialisation 




2900iZ... 








for octal 


only. 




for Hex 





Not Specified Not Specified Display code EBCDIC 



Vi Field data 



O 



TABLE 6 ARITHMETIC AND EXPRESSIONS



lUS £££ 

l/J/X - rUlAT|l/J|/X «•■ 

roc CDC Sublet tpt (t| c*n 
b* jiiMed in arlthawtlc 
■t*t«icnts for (ubtodptad 
«<(Ubl«* If on* It mt pttunt. 



Nut •! lowed 






£fiC 
Ma 



J£!l 



umvitc 



yc« 



U *•-( ok |«nd otlMC ■lallat 
on* ft 


Not allowed 


Not allowad 


Not allowed 


Not allowed 


Not allowed 


Ai No 
Vi «as 


Hot allowed 


Subscdpii (cqutrcd foe 
■ultl-diwn*ion*l «c[«ya 




At l**«t on* 


Ml 


All 


None 


All 


All 




Man-inte^cal subscript* 




Not allowed 


Hot allowed 


Hot allowed 


mil Vea 

MINI No 


V*ai teal 


«*■ 


V«ai real oi 



to 
I 



Results of out-of-range com- Fall through

puted GO TO.

Arithmetic IF can have missing Not allowed
statement labels.

Arithmetic IF can have complex Not allowed
argument



DO 1 I = 10,1 executes No times

This is a difficult error to
monitor unless IF's are inserted
for all DO statements but it can
be fixed with ON=J on the CFT statement
Coaples relational operator. ,NB. and 

CoB4>l*x .EQ. only 



iJibelB on non-executable 
non-rOHMKT statenent* 



Nat allowed 



TABLE 7 BRANCHING AND CONTROL STATEMENTS
ANSI-66 ANSI-77 CDC IBM UNIVAC ICL HONEYWELL

Nut allowed Pall through ratal error Pall through Pall through Pall through Petal error 



Nat allowed Not allowed 



Nut allowed Ho tlae* 



Not allowed 



Not allowed 



.N£. and 
.eo only 



Once 



OKi only 
real part 
te*t*d *HC*pt 
.NE, and .EQ. 



tubals allowedi 
Reference not 
allowed 



Once 



Hat allowed 



Hot allowed 



yes, Cora cen ItOOt yes Yes 
also a»00i No 



Not allowed Not allowed ye*i only Not allowed Not allowed 

real part 
tested 



A* if Once 

m 1 1-10, 1,-1 



Hot allowed 
can and 00. 
Vi ye* 



Al FOHMAT 



Not allowed 



M 

o 
o 



(9 



Item 
Free format I/O 

End-oC-file, erior checks 

TAPE, I/O TAPE, etc. allowed 
with R/W statements 

Random mass stocage I/O 
Other I/O statenents 

None CFT FORMAT specs 

Fornat paren nesting naxlBun 
Encode/Decode pecularltiea 



CFT 

Not allowed 



ENO>, ERR-, 
EOF, UNIT, 
I EOF 



Hot allowed 



AMSI-66 
Not allowed 



GETPOS 
GETPOS 



Not allowed 



Hot allowed 



None 



Not allowed 



Not allowed 



TABLE 8 INPUT/OUTPUT FORMATTING



ANSI -77 



CDC 



* for state- * (or state- 
sent label in aient label 
I/O statement 



END-, ERR- 
lOSTAT- 



Not allowed 



FTNl EOF 
lOCHEX 
RUNi EOF.U 
IOC HEX, U 

FTNl No 
RUNi Yes 



READ(..SEC-) REASHS/ 
WRITE (..NEC-) HRITEMS etc. 



OPEN 

CLOSE 

INQUIRE 

N.dBe,etc. 



R£ADBC 
VIRITEC 
HOVLEV 

Ew.dEe, etc. 



FTNl 
RUNi 



READ i HRITB 
to Character 
strings 



* for state- 
ment label 



Not allowed 



FIND 
(a'rt etc. 

REAO(U,ID-W) 
AO<UV) 
DEPINF, etc. 

Qw.d 

E,F ok Cor 

integersi G 

ok integer, 

logical 



REREAD 



ICL 



HONEYWELL 



• for state- * for state- Not allowed 
nent label ment label 



END-, BRR- 



END-, ERR- 
lOSTAT- 



Ai 

Vi 



No 
Yes 



FIND 
(a'cl etc. 



Not allowed 



READ (u, CI, clause) 

list, etc. Like IBM 



Iw.d, Jw, ^S Qw.d, V, (kt.c 
Ew.dEe, etc. C ok for logical 
or integer 



5+ 



Char count 
optional I no. 
of chars 
converted 
available. 
ERR- allowed 



Char count 
not Included. 
ERR- allowed. 



O 

a 



cd 
i 
oa 



In ftagttmi urruRH'STOr 
In subroutine! END • ReruRN 



Mtiriwt* (aturn* ayntu 



CfT 

No 



mM* 



I 

TABLE 9 SUBROUTINES AND FUNCTIONS
ANSI-77 CDC IBM



Ho 



iUIHt Xal/No No 



UMIVfrp 

*lnt*(n«|* 

■ubfoutlnc* 

•llouad 



ICJi IIOMEKWe 

STOP cequUad 



• t>«(0(« Mot allomd 
iDbtl in CALL 

* In mtttt. 



eNTMy k»* own c*Utn9 saqucnc* 

and usabU aa a function 
CDC EHTM stataaanta auat hava 
ooiiact calling laquancaa added. 

EMO atateaent laqulrad 

Oumy atgunent in alaahea 
• call by addfeaa 

*i aubpcogcaa nana* allowed 
Oveday ayntax 



oeriNE uaed in ailtlimttlc 
atataaent functiona 



Mot allowad 



• balora ' I-1-....I * iMfoia 
label in CALL HGTUIIHSI-,-, label in CALL 

* in aubc. ...^ t in aubr. 



Ma 



yea 




yea 


yea 




lllaaal 


Illegal 


Illegal 


Illegal 


>•■ 


Not allowed 


Not allowed 


Not allowed 


Not allowed 


Hi yea 

Gi Ha 


LDR coaaanda 
ROOT, l«WL, 
SOVL 
CALL oveday 


WisiMcKied 


Unapecifled 


OVEItLAy|...| 




Not allowed 


Not allowed 


Mot allowed 


Not allowed 


Allowad 



Alt or I t baCoie I'OKTNAN 11 

befoia label label in CALL syntax uaed 

in CALL. 

Vi Indaalng 

la abaoluta. 

not ralaklva. 



Veai aaaa type yea 
aa function 


yea 


NO yes 


yes 


Ves Vea 


Illegal 


In MTBMIAI. Mot allowed 
only 


Not allowed 


stateaent 





Allowad 



Hot allowsd 



TABLE 10 INTRINSIC OR INLINE FUNCTIONS



O 

o 

-4 



I tea 
Memory management 
I/O 
Compiler directives 

Assembler code with FORTRAN 
code insertion 



Actual functions that are 
unusual 



CFT 
None 



COIRS ■ LIST 
NOLI ST, EJECT 
etc. 



Not allowed 
via UPDATE 

None 



ANSI -66 - 

None 

Not allowed 

Not specified 

Not allowed 
Hot allowed 

None 



AN3I-77 



Not allowed 



CDC 

ECS 

Unload 

mil No 
BUNi between 
routines, 
. in column 1 

IDENT 

Via UPDATE 



IBM 
None 
Unload 
Generic 

Not allowed 



SHIET All double 
RUHi FORTRAN precision 
II syntax g prefixes 

on functions 



UNIVAC 
BANK 

Coitpiler 

Not allowed 

Include 
Delete 

All double 
precision 



HONEHWELL 



Double precisian -f 
D,E,Q,R,I,J,K 
prefixes) fatal 
functions I many 
additional such 
as trig with 
degrees . 



CO 
I 



AMSI-6t 



TABLE 11 EXTERNAL FUNCTIONS
ANSI-77 CDC IBM 



HONEYWELL 



General differences 



Specific functions 



None 



All double 
precision 



FORTRAN II 

functions I 

LOCF.ABSF, etc 

SLITE 

DISPLA 

TIME 

lOCHEC 



All double 
precision 

GAMMA 
LGAMHA 



All double 
precision 

Time, date 



APPENDIX C 

TRACEBACK 

There is a FORTRAN library routine TRBK (file) that you can call to
determine how the program reached the current routine. If there is no
argument, the trace is printed in the logfile; if file is specified,
'$OUT' perhaps, the traceback goes to that file.

When a task terminates abnormally, TRBK is automatically called so a
trace of the subroutine calling tree to the point of error is provided.
To make this as useful as possible, turn on the block listing from CFT:

      CFT,ON=B, ...

and the load map on the loader card: LDR,MAP,... so that the addresses
provided can be easily localized in the FORTRAN source. Below is an
annotated abort message to illustrate using the trace information.

AB053 - FLOATING POINT ERROR

AB000 - JOB STEP ABORTED. P = 0144760A

      - ******** WAS CALLED BY SOLVE AT LOCATION 0144760A

      - SOLVE WAS CALLED BY EQNS AT LOCATION 0012341C

      - EQNS WAS CALLED BY $MAIN AT LOCATION 0003411A

From the first line of the abort message, the cause of termination is a
floating point overflow. Another common diagnostic is "OPERAND RANGE
ERROR", which occurs when an attempt is made to reference some part of
memory outside your user area, most likely because of a wild index. If
all output is lost, one possible cause is using block common with a
DIMENSION of 1, COMMON X(1), but storing into X with subscripts much
larger than 1. (This may overwrite I/O buffers and tables.)

The last instruction being executed at the time of abort was at location
0144760A. For the FORTRAN programmer, discard the parcel, A in this
case, leaving the OCTAL address of the instruction word. The actual
error probably occurred on an earlier instruction. The load map ADDRESS
gives the base address of all routines. In this case, find the base
address of SOLVE, where the error occurred (from the third line of the
abort message). Subtract the base address from the absolute P-counter
address given to find the relative address in SOLVE. (Remember to
subtract in OCTAL!)



Because the BLOCK (ON=B) listing was turned on for CFT, you will have a
list of all blocks, which allows you to find the FORTRAN code block
where the abort occurred:

      P-Counter                     0144760 (A)

      Loader ADDRESS of SOLVE     - 0104700
                                    -------
      Relative address in SOLVE     0040060

Partial block listing for SOLVE:

SOLVE VECTOR BLOCK BEGINS AT SEQ. NO. 1372, P=40035B
SOLVE BLOCK BEGINS AT SEQ. NO. 1380, P=40055D
SOLVE BLOCK BEGINS AT SEQ. NO. 1401, P=40077A

Because the relative error address is 40060, which is between 40055 and
40077, the bomb occurred for a FORTRAN statement between numbers 1380
and 1401. The listing should now help you find the probable source of
the error quickly.

Similarly, the point at which SOLVE was called by EQNS can be
determined. The absolute address of the CALL was 0012341C and EQNS was
called by the main program at absolute location 0003411A.
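
The octal subtraction can also be left to the machine. The sketch below
is written for a current Fortran compiler: the CHARACTER variables and
the O edit descriptor are assumptions about the compiler in use and are
not claimed to be available under CFT 1.06. The two addresses are the
ones from the abort message and load map in this appendix.

```fortran
      PROGRAM OCTSUB
C     Illustrative sketch only.  The O edit descriptor used below
C     is accepted by current Fortran compilers; its availability
C     under a given CFT release is an assumption.
      INTEGER IP, IBASE, IREL
      CHARACTER*7 CP, CBASE
C     Absolute P-counter (parcel letter dropped) and the load-map
C     base address of SOLVE, both in octal.
      CP = '0144760'
      CBASE = '0104700'
      READ (CP, '(O7)') IP
      READ (CBASE, '(O7)') IBASE
      IREL = IP - IBASE
      WRITE (*, '(1X,A,O7.7)') 'RELATIVE ADDRESS (OCTAL) = ', IREL
      END
```

The program prints 0040060, the relative address used to pick out the
block that begins at sequence number 1380.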
APPENDIX D

COMPILER DIRECTIVES

Compiler directive lines begin with characters CDIR$ in columns 1
through 5 and any of the directives listed below in columns 7 through 72.

DIRECTIVE    FUNCTION

EJECT        Ejects to top of next page.

LIST         Resumes listable output.

NOLIST       Suppresses production of listable output.

CODE         Produces code list.

NOCODE       Suppresses production of CFT-generated code lists.

VECTOR       Enables vectorization of inner DO-loops.

NOVECTOR     Suppresses vectorization of inner DO-loops.

IVDEP        Ignores vector dependencies in the next DO-loop.

INT24        Identifies listed variables and arrays as 24-bit
             integers, equivalent to INTEGER*2 declarative.

FLOW         Enables flowtrace.

NOFLOW       Disables flowtrace.

SCHED        Enables the scheduler.

NOSCHED      Disables the scheduler.

VFUNCTION    Identifies external vector functions.

BOUNDS       Checks array references for out-of-bounds subscripts.
APPENDIX E

CHARACTER SETS

CHAR   ASCII  HEX  ASCII           CHAR   ASCII  HEX  ASCII
       OCTAL       CARD CODE              OCTAL       CARD CODE

NUL    000    00   12-0-9-8-1      @      100    40   8-4
SOH    001    01   12-9-1          A      101    41   12-1
STX    002    02   12-9-2          B      102    42   12-2
ETX    003    03   12-9-3          C      103    43   12-3
EOT    004    04   9-7             D      104    44   12-4
ENQ    005    05   0-9-8-5         E      105    45   12-5
ACK    006    06   0-9-8-6         F      106    46   12-6
BEL    007    07   0-9-8-7         G      107    47   12-7
BS     010    08   11-9-6          H      110    48   12-8
HT     011    09   12-9-5          I      111    49   12-9
LF     012    0A   0-9-5           J      112    4A   11-1
VT     013    0B   12-9-8-3        K      113    4B   11-2
FF     014    0C   12-9-8-4        L      114    4C   11-3
CR     015    0D   12-9-8-5        M      115    4D   11-4
SO     016    0E   12-9-8-6        N      116    4E   11-5
SI     017    0F   12-9-8-7        O      117    4F   11-6
DLE    020    10   12-11-9-8-1     P      120    50   11-7
DC1    021    11   11-9-1          Q      121    51   11-8
DC2    022    12   11-9-2          R      122    52   11-9
DC3    023    13   11-9-3          S      123    53   0-2
DC4    024    14   4-8-9           T      124    54   0-3
NAK    025    15   9-8-5           U      125    55   0-4
SYN    026    16   9-2             V      126    56   0-5
ETB    027    17   0-9-6           W      127    57   0-6
CAN    030    18   11-9-8          X      130    58   0-7
EM     031    19   11-9-8-1        Y      131    59   0-8
SUB    032    1A   9-8-7           Z      132    5A   0-9
ESC    033    1B   0-9-7           [      133    5B   12-8-2
FS     034    1C   11-9-8-4        \      134    5C   0-8-2
GS     035    1D   11-9-8-5        ]      135    5D   11-8-2
RS     036    1E   11-9-8-6        ^      136    5E   11-8-7
US     037    1F   11-9-8-7        _      137    5F   0-8-5
Space  040    20   None            `      140    60   8-1

!      041    21   12-8-7          a      141    61   12-0-1
"      042    22   8-7             b      142    62   12-0-2
#      043    23   8-3             c      143    63   12-0-3
$      044    24   11-8-3          d      144    64   12-0-4
%      045    25   0-8-4           e      145    65   12-0-5
&      046    26   12              f      146    66   12-0-6
'      047    27   8-5             g      147    67   12-0-7
(      050    28   12-8-5          h      150    68   12-0-8
)      051    29   11-8-5          i      151    69   12-0-9
*      052    2A   11-8-4          j      152    6A   12-11-1
+      053    2B   12-8-6          k      153    6B   12-11-2
,      054    2C   0-8-3           l      154    6C   12-11-3
-      055    2D   11              m      155    6D   12-11-4
.      056    2E   12-8-3          n      156    6E   12-11-5
/      057    2F   0-1             o      157    6F   12-11-6
0      060    30   0               p      160    70   12-11-7
1      061    31   1               q      161    71   12-11-8
2      062    32   2               r      162    72   12-11-9
3      063    33   3               s      163    73   11-0-2
4      064    34   4               t      164    74   11-0-3
5      065    35   5               u      165    75   11-0-4
6      066    36   6               v      166    76   11-0-5
7      067    37   7               w      167    77   11-0-6
8      070    38   8               x      170    78   11-0-7
9      071    39   9               y      171    79   11-0-8
:      072    3A   8-2             z      172    7A   11-0-9
;      073    3B   11-8-6          {      173    7B   12-0
<      074    3C   12-8-4          |      174    7C   12-11
=      075    3D   8-6             }      175    7D   11-0
>      076    3E   0-8-6           ~      176    7E   11-0-1
?      077    3F   0-8-7           DEL    177    7F   12-9-7